-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Add Integrity Checker for BookTitle #13854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add Integrity Checker for BookTitle #13854
Conversation
Part of JabRef#12271
Requirements1. Integrity Checker should flag a
|
|
@koppor please check if the approach fits with the expectations of the feature, and help clarify those questions. I'll start laying out some of the core logic in the meantime. |
|
Since I have not received any feedback on the approach for the last two weeks, I'll be pushing an implementation as per the outlined approach in a couple of days. |
- Add integrity checkers for each of the following fields: Year, Month, Page Range, and Location. - Update FieldCheckers to apply the checkers on the Booktitle, Journal, and Journaltitle fields. - Add location data (countries_cities1000.txt) containing country and city names. The data is sourced from - Add LocationDetector to load location data and allow location extraction from input. - Add BooktitleCleanupField enum to represent each checker field. - Add BooktitleCleanupAction enum to represent the following cleanup actions: remove only, replace, replace if empty, and skip. - Consolidate reused pre-compiled regex patterns into RegexPatterns. - Add BooktitleCleanupPanel and BooktitleCleanupPanelViewModel (inspired by FieldFormatterCleanups) to the GUI. - Add BooktitleCleanupPanel.fxml for the cleanup section's layout and integrate it into CleanupPresetPanel.fxml. - Update CleanupPreferences, CleanupPresetPanel, and CleanupWorker to include Booktitle cleanups. - Update module-info.java to export the BooktitleCleanupAction enum. - Add localization keys to JabRef_en.properties for all the new cleanup GUI text. - Add unit tests and relevant javadoc. Part of JabRef#12271
jabgui/src/main/java/org/jabref/gui/commonfxcontrols/BooktitleCleanupPanel.java
Outdated
Show resolved
Hide resolved
jablib/src/test/java/org/jabref/logic/integrity/BooktitleLocationCheckerTest.java
Show resolved
Hide resolved
jablib/src/test/java/org/jabref/logic/integrity/BooktitleMonthCheckerTest.java
Show resolved
Hide resolved
|
@koppor check if the feature implementation so far is up to expectations. Also, please help me figure out whether it requires adding a preference migration or updating preferences in the CliPreferences to have JabRef "remember" the previous choice for the cleanup panel. |
jablib/src/test/java/org/jabref/logic/integrity/BooktitlePageRangeCheckerTest.java
Show resolved
Hide resolved
jablib/src/test/java/org/jabref/logic/integrity/BooktitleYearCheckerTest.java
Show resolved
Hide resolved
I assume the functionality is existing for the other settings. Thus, please also support this for the new checkbox. Thank you 😅. |
My question wasn't a "whether or not" to add the functionality, it says "whether it requires A OR B" to be modified to have it enabled, i.e., which of the two options need to be updated to get it to work. |
- Updated CleanupPresetPanel, BooktitleCleanupPanel, and BooktitleCleanupPanelViewModel bindings to have JabRef remember the previous cleanup action selection. - Updated CleanupAction to update the BooktitleCleanups from the current cleanup preset value. - Update BooktitleCleanups to return a map of the current cleanup actions. - Update CleanupPreferences to allow setting booktitleCleanups property directly. Part of JabRef#12271
Part of JabRef#12271
|
|
||
| public CleanupPreferences(EnumSet<CleanupStep> activeJobs) { | ||
| this(activeJobs, new FieldFormatterCleanups(false, new ArrayList<>())); | ||
| this(activeJobs, new FieldFormatterCleanups(false, new ArrayList<>()), new BooktitleCleanups(false, LocationDetector.getInstance())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the wrong approach, you have to use this CleanupStep and add it tot the list of Cleanup Steps
, see also CleanupWorker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to remove the section which lets the user pick replacement options? That is, the second rectangle in this picture (same image as in the PR above)

If not, then how would you suggest managing the state for the currently selected options?
I remember that the CleanupStep enum is for simple cleanups that do not keep state. The booktitle cleanup can potentially replace the data in 4 other fields; which is why a move should be confirmed by the user beforehand. This requires keeping track of state across each of the 4 fields.
The only CleanupStep cleanup that has any state tied to the UI is this one:

This single extra checkbox state is represented as two entries within the enum:
public enum CleanupStep {
...
RENAME_PDF,
RENAME_PDF_ONLY_RELATIVE_PATHS,
...And the CleanupWorker calls the constructor separately while adding the cleanup job:
return switch (action) {
...
case RENAME_PDF ->
new RenamePdfCleanup(false, () -> databaseContext, filePreferences);
case RENAME_PDF_ONLY_RELATIVE_PATHS ->
new RenamePdfCleanup(true, () -> databaseContext, filePreferences);
...The same if-else chain exists on the GUI side in the CleanupPresetPanel:
if (cleanUpRenamePDF.isSelected()) {
if (cleanUpRenamePDFonlyRelativePaths.isSelected()) {
activeJobs.add(CleanupPreferences.CleanupStep.RENAME_PDF_ONLY_RELATIVE_PATHS);
} else {
activeJobs.add(CleanupPreferences.CleanupStep.RENAME_PDF);
}
}Adding the booktitle cleanup as a CleanupStep enum entry would require chaining similar if-else statements, but even longer (not to mention adding half a dozen entries to the enum). This is why I chose to model the cleanup after the FieldFormatterCleanup, which is also a modular cleanup that tracks state.
Clarify if this is what you want and I'll go ahead and implement it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see that you added a user choice in the UI... Not sure where the motivation comes from and how the user should decide.
Compare to the cleanup of the notes-to-doi field - it just leaves the note as is if there is already a doi. However, if the DOI is equal to the one in the notes field, the doi is removed from the notes field.
The booktitle cleanup can to the same for each extracted field.
More sophisticated it would be to show a merge dialog with the two options and then the user can decide. In the current implementation, the user should know which field is correct, which they cannot see just looking at the dialog. -- In other words: We need to think from the user side, not from the algorithmitc side.
See org.jabref.logic.cleanup.DoiCleanup for the code.
I would hope your cleanup is doing simlar things. I think, it should just be a CleanupStep without more user interaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see that you added a user choice in the UI...
The user decides, for each field:
- Remove Only: Clean up the booktitle field without moving the extracted value.
- Replace: Move the extracted value to its target field and overwrite any existing value.
- Move If Empty: Move the extracted value only if the target field is empty.
- Skip Cleanup: Skip the cleanup altogether; in case a field value (like
monthorlocation) is actually part of the booktitle.
Hopefully this clarifies how the user choices are implemented.
Not sure where the motivation comes from
The motivation comes from the original issue (screenshot attached). The idea was to give users control in cases where the extraction may or may not be desired.
Compare to the cleanup... The booktitle cleanup can to the same for each extracted field.
This is handled through option 3 above (“Move If Empty”). That effectively mirrors how the general cleanup behaves.
More sophisticated it would be to show a merge dialog...
This sounds like a good enhancement. We can also add an option to apply a default cleanup selection automatically in case a user doesn't want to go through each entry manually (similar to how a copy-paste dialog box works on windows).
I think, it should just be a CleanupStep without more user interaction.
Just to confirm: do you mean defaulting to option 3 (“Move If Empty”), removing the radio-button UI entirely, and moving the logic into the CleanupStep enum? I just want to make sure I’m interpreting your suggestion correctly before proceeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm: do you mean defaulting to option 3 (“Move If Empty”), removing the radio-button UI entirely, and moving the logic into the CleanupStep enum? I just want to make sure I’m interpreting your suggestion correctly before proceeding.
Yes.
Longer answer:
Hopefully this clarifies how the user choices are implemented.
How should the user know what to choose!
Is there a general answer or is it a case-by-case basis?
How to distinguish the cases?
Can there be more automation?
The logic in the note-cleanup shows a fully-automated approach - and the user doe snot need to decide anything.
When it comes to the mentioned "confirmation", its time for the merge dialog
| jobs.addAll(preset.getFieldFormatterCleanups().getConfiguredActions()); | ||
| } | ||
|
|
||
| if (preset.getBooktitleCleanups().isEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed if you put intot he cleanup steps
| private int maxLocationLength = 0; | ||
| private int minLocationLength = Integer.MAX_VALUE; | ||
|
|
||
| private LocationDetector() throws JabRefException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole class should be loading the data from an mv file like for example we do it with nournal abbrvatiosn or other data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it.
I think, its both. B to have it working, A to update existing preferences to have it enabled for all users. |
Sorry for this. We do this in our freetime and have not enough freetime to cover all contributions. We try to invest more - currently by skipping sports and reducing our sleep. Since a day has only 24 hours, also there are limits. |
Thanks for the update. I completely understand the time constraints. Open source is voluntary and I appreciate the work. Just to clarify the context of my earlier ping: I wasn’t asking for a full code review (there was no code in the PR at that time). I was hoping to confirm the implementation approach before investing more time in writing code. Because this issue has had discussions spread across several years, it helps to re-establish common ground regarding the expectations and the architecture up front. As for code, the last message I sent was on September 30th, 50 days ago, to ask for help when I was stuck. That has now been answered. |
Sure. We also might have missed some messages in the dev chat. It is very OK to ask after a few weeks. It is also very OK to look into other PRs and try to give feedback to other contributors. Some need help with their submodules; this is what most contributors should be able to support. By that, our community grows, contributors get more feedback and it also decreases the delay of answers at the own project, because the whole community spends more time on the project. |
|
@TheYorouzoya Please not that we merged a PR started September, 11 also touching the GUI (#13852). Maybe, you first remove the radio buttons and then try to merge latest main to resolve the merge conflicts. -- You could also split-up the PR into two: One for the integrity check and one for the GUI. Some of us have success with GitButler, but this tool also needs some time to get used to it. Its benefit its: one sees the changes of all PRs combined - and can assign local changes to PRs. -- You can also "just" finish the PR, The license of Since we also asked for an MVStore for that, maybe, a separate PR should start with that... Because this is deep in our build pipeline and you need to get used to JBang etc. |
|
What I meant: move if empty, if equal: remove, else noop |



Closes #12271
This PR adds a new integrity checker for the booktitle field along with the associated cleanup actions.
I'll be doing the development in the following major phases (these will be refined further as I move along):
Steps to test
Mandatory checks
CHANGELOG.mdin a way that is understandable for the average user (if change is visible to the user)